Exploring the Penn World Tables with R

Introduction

Welcome to an exploration of the Penn World Tables (PWT) using R! In this tutorial, you’ll learn how to read, manipulate, and analyze PWT data.

Prerequisites

Before diving into the analysis, let’s load the necessary R packages. These packages will help us read data files, manipulate data efficiently, and manage file paths easily.

library(haven)    # For reading Stata data files
library(dplyr)    # For data manipulation
library(here)     # For building file paths

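Loading dplyr prints messages about masked functions: dplyr’s filter() and lag() replace the functions of the same name in the stats package. This is expected, and the dplyr versions are the ones used throughout this tutorial.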
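The masking matters later on, because the two lag() functions behave differently on a plain vector. The sketch below is only an illustration with a made-up vector x; it calls each function with an explicit package prefix so the result does not depend on load order.

```r
x <- c(10, 20, 30)

# dplyr's lag() shifts the values down by one position and pads with NA
lagged_dplyr <- dplyr::lag(x)

# stats::lag() is meant for time series: on a plain vector it leaves the
# values untouched and only attaches a shifted time index (a "tsp" attribute)
lagged_stats <- stats::lag(x)

lagged_dplyr
as.numeric(lagged_stats)
```

If a growth-rate calculation ever returns all zeros, an unintended call to stats::lag() is a likely cause.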

Reading the Data

Let’s kick things off by reading the PWT dataset into R. We’ll use the here() function to ensure the path is relative to your project directory.

penn <- read_dta(here("databases/pwt100.dta"))

Take a peek at the dataset to understand its structure and content. head() returns the first six rows, and [1:7] keeps only the first seven columns:

head(penn)[1:7]

# A tibble: 6 × 7
  countrycode country currency_unit   year rgdpe rgdpo   pop
  <chr>       <chr>   <chr>          <dbl> <dbl> <dbl> <dbl>
1 ABW         Aruba   Aruban Guilder  1950    NA    NA    NA
2 ABW         Aruba   Aruban Guilder  1951    NA    NA    NA
3 ABW         Aruba   Aruban Guilder  1952    NA    NA    NA
4 ABW         Aruba   Aruban Guilder  1953    NA    NA    NA
5 ABW         Aruba   Aruban Guilder  1954    NA    NA    NA
6 ABW         Aruba   Aruban Guilder  1955    NA    NA    NA

Let’s also view the last few rows to get a sense of the data’s scope:

tail(penn)[1:7]

# A tibble: 6 × 7
  countrycode country  currency_unit  year  rgdpe  rgdpo   pop
  <chr>       <chr>    <chr>         <dbl>  <dbl>  <dbl> <dbl>
1 ZWE         Zimbabwe US Dollar      2014 37861. 38675.  13.6
2 ZWE         Zimbabwe US Dollar      2015 40142. 39799.  13.8
3 ZWE         Zimbabwe US Dollar      2016 41875. 40963.  14.0
4 ZWE         Zimbabwe US Dollar      2017 44672. 44317.  14.2
5 ZWE         Zimbabwe US Dollar      2018 44325. 43421.  14.4
6 ZWE         Zimbabwe US Dollar      2019 42296. 40827.  14.6

Each column holds one variable, and each row holds one observation: the values of all variables for one country in one year.
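One way to confirm this country-year layout is to count rows per (countrycode, year) pair. The sketch below is a minimal illustration on a toy tibble (the name toy is hypothetical) built from a few rows printed above, not on the full PWT file; the same pipeline should work on penn.

```r
library(dplyr)

# Toy panel using rows from the printouts above
toy <- tibble(
  countrycode = c("ABW", "ABW", "ZWE", "ZWE"),
  year        = c(1950, 1951, 2018, 2019),
  pop         = c(NA, NA, 14.4, 14.6)
)

# A panel with one row per country-year has no pair that appears twice
dupes <- toy %>%
  count(countrycode, year) %>%
  filter(n > 1)
nrow(dupes)

# First year, last year, and number of observations for each country
coverage <- toy %>%
  group_by(countrycode) %>%
  summarise(first_year = min(year), last_year = max(year), n_obs = n())
coverage
```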

Analyzing GDP Per Capita

We’ll now focus on analyzing GDP per capita over time for different countries. First, let’s select the necessary variables:

temp <- penn %>% select(countrycode, year, rgdpna, pop)
head(temp)

# A tibble: 6 × 4
  countrycode  year rgdpna   pop
  <chr>       <dbl>  <dbl> <dbl>
1 ABW          1950     NA    NA
2 ABW          1951     NA    NA
3 ABW          1952     NA    NA
4 ABW          1953     NA    NA
5 ABW          1954     NA    NA
6 ABW          1955     NA    NA

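Next, we calculate GDP per capita (ypop) by dividing real GDP (rgdpna) by population (pop). PWT reports rgdpna in millions of 2017 US dollars and pop in millions of people, so the ratio is in 2017 US dollars per person: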

temp$ypop <- temp$rgdpna / temp$pop

Check out the result of this calculation:

tail(temp)

# A tibble: 6 × 5
  countrycode  year rgdpna   pop  ypop
  <chr>       <dbl>  <dbl> <dbl> <dbl>
1 ZWE          2014 41274.  13.6 3038.
2 ZWE          2015 42008.  13.8 3041.
3 ZWE          2016 42326.  14.0 3017.
4 ZWE          2017 44317.  14.2 3113.
5 ZWE          2018 46457.  14.4 3218.
6 ZWE          2019 42694.  14.6 2915.
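The same column can also be created with dplyr’s mutate(), which keeps the calculation inside a pipeline. This is a minimal sketch on a toy tibble (the name toy is hypothetical) filled with the rounded Zimbabwe values printed above. Because pop is rounded to one decimal here, the results differ slightly from the ypop column in the printout.

```r
library(dplyr)

toy <- tibble(
  countrycode = "ZWE",
  year        = c(2017, 2018, 2019),
  rgdpna      = c(44317, 46457, 42694),  # rounded values from the printout
  pop         = c(14.2, 14.4, 14.6)      # rounded values from the printout
)

# mutate() adds ypop as a new column and returns the extended tibble
toy <- toy %>% mutate(ypop = rgdpna / pop)
toy
```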

Plotting GDP Per Capita for the USA

Visualizing data makes it more comprehensible. Let’s create a plot showing the GDP per capita for the USA over time:

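# Keep only the rows for the United States
temp_usa <- temp %>% filter(countrycode == "USA")

# Line plot of GDP per capita, rescaled from dollars to thousands of dollars
plot(temp_usa$year, temp_usa$ypop / 1000, main = "GDP Per Capita in the USA",
     xlab = "", ylab = "Thousands of US dollars",
     type = "l", col = "blue", lwd = 2, las = 1)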

Calculating Growth Rates

Growth rates show how quickly GDP per capita changes from one year to the next. We’ll first demonstrate an incorrect calculation to highlight a common pitfall:

temp_example <- temp %>% 
  filter(countrycode %in% c("USA", "COL"), year %in% 2006:2009) %>%
  select(countrycode, year, ypop)

temp_example$growth <- 100 * (temp_example$ypop - lag(temp_example$ypop)) / lag(temp_example$ypop)
temp_example

# A tibble: 8 × 4
  countrycode  year   ypop   growth
  <chr>       <dbl>  <dbl>    <dbl>
1 COL          2006 10031.  NA     
2 COL          2007 10575.   5.43  
3 COL          2008 10795.   2.08  
4 COL          2009 10797.   0.0207
5 USA          2006 55484. 414.    
6 USA          2007 55989.   0.910 
7 USA          2008 55382.  -1.08  
8 USA          2009 53480.  -3.43
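The value in row 5 can be reproduced by hand from the rounded figures in the table. The variable names below are hypothetical and exist only for this check.

```r
ypop_col_2009 <- 10797   # row 4: Colombia, 2009
ypop_usa_2006 <- 55484   # row 5: USA, 2006

# Same formula as above, applied to row 5 and the row directly before it
spurious <- 100 * (ypop_usa_2006 - ypop_col_2009) / ypop_col_2009
round(spurious)   # 414
```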

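Note that the growth rate of 414% for the USA in 2006 is not correct. lag() simply takes the value from the previous row, and the row before USA 2006 is Colombia 2009, so the formula compares two different countries. The first year of each country should be NA instead. Here’s the correct approach: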

temp_corrected <- temp_example %>% 
  group_by(countrycode) %>%
  mutate(growth = 100 * (ypop - lag(ypop)) / lag(ypop))

temp_corrected

# A tibble: 8 × 4
# Groups:   countrycode [2]
  countrycode  year   ypop  growth
  <chr>       <dbl>  <dbl>   <dbl>
1 COL          2006 10031. NA     
2 COL          2007 10575.  5.43  
3 COL          2008 10795.  2.08  
4 COL          2009 10797.  0.0207
5 USA          2006 55484. NA     
6 USA          2007 55989.  0.910 
7 USA          2008 55382. -1.08  
8 USA          2009 53480. -3.43

Because the data are grouped by country, lag() restarts within each group: the first year of every country is NA, and no growth rate mixes values from two countries.
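Grouping alone still relies on the rows being sorted by year within each country, because lag() works on row order. The sketch below is one possible way to make that explicit, using a deliberately shuffled toy tibble (the name toy is hypothetical) with values from the table above. It sorts before computing growth and calls ungroup() afterwards so that later steps are not applied per country by accident.

```r
library(dplyr)

# Rows deliberately out of order
toy <- tibble(
  countrycode = c("USA", "COL", "USA", "COL"),
  year        = c(2007, 2006, 2006, 2007),
  ypop        = c(55989, 10031, 55484, 10575)
)

growth <- toy %>%
  arrange(countrycode, year) %>%   # lag() uses row order, so sort first
  group_by(countrycode) %>%        # restart the lag for each country
  mutate(growth = 100 * (ypop - lag(ypop)) / lag(ypop)) %>%
  ungroup()                        # drop the grouping once it is no longer needed

growth
```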

Conclusion

In this guide, you’ve learned how to work with the Penn World Tables in R. From reading and manipulating data to calculating GDP per capita and growth rates, these techniques are fundamental for economic data analysis.